Abstract


This paper examines ways in which the addition of data modeling features
can enhance the capabilities of mathematical modeling languages, and
demonstrates how such integration might be achieved as an application
of the embedded languages technique proposed by Bhargava and Kimbrough [4].
Decision-making, and decision support systems, require the
representation and manipulation of both data and mathematical models.


Several data modeling languages as well as several mathematical modeling
languages exist, but they have different sets of capabilities. We
motivate with a detailed example the need for the integration of these
languages. We describe the benefits that might result, and claim that
this could lead to a significant improvement in the functionality of model
management systems. Then we present our approach for the integration
of these languages, and specify how the claimed benefits are realized.


This author's work on this paper was performed in conjunction with research funded by
the  Naval  Postgraduate  School, 
1 Introduction


This paper examines ways in which the addition of data modeling features
can enhance the capabilities of mathematical modeling systems, and presents a
methodology for integrating data and mathematical modeling languages. Our
research is based on the recognition that decision-making, and decision support
systems, require the representation and manipulation of both data and
mathematical models (see, e.g., [42]). Research in database systems has led to the
development of several data modeling languages (e.g., languages based on the
semantic data model [29]). Similarly, several languages have been proposed for
mathematical modeling as well (e.g., algebraic modeling languages [21]).
However, these two sets of languages have usually been developed independently of
each other, and have various differences in their capabilities. Data modeling
languages have few features for representing mathematical relationships between
elements of the domain, while mathematical modeling languages lack many of
the facilities, found in data modeling languages, for representing qualitative
relationships between these elements.

There  is  a  consensus  among  researchers  (e.g.,  [6,  4,  27,  16])  that  decision 
support  and  modeling  systems  should  support  the  entire  modeling  life-cycle.  In 
recent  years  there  has  been  much  research  on  the  model  management  component 
of  a  decision  support  system,  aimed  at  model  representation  and  manipulation. 
This  work  has  been  based  on  several  different  approaches  such  as  structured 
modeling  [27,  26],  graph-based  modeling  [31],  embedded  languages  [6,  4],  and 
executable  modeling  languages  [21,  20,  9,  4,  24].  These  approaches  have  led  to 
the  development  of  several  model  representation  languages,  such  as  SML  [24] 
and  LSM  [12]  (for  structured  modeling),  and  [6,  4]  (in  the  embedded 
languages  approach),  and  AMPL  [20]  (an  executable  modeling  language).  It 
has  resulted  in  several  general  modeling  systems,  including  FW/SM  [23]  for 
structured  modeling,  TEFA  [6] — based  on  embedded  languages,  NETWORKS 
[31] — a  graph-based  modeling  system,  and  GAMS  [9] — based  on  an  algebraic 
executable  modeling  language,  as  well  as  special-purpose  systems  to  support 
specific modeling activities. For example, ANALYZE [28] supports understanding
and analysis of linear programming models and solutions, and LPFORM
[35] supports the formulation of linear programs.

These approaches have emphasized the development of mathematical modeling
capabilities, but have largely ignored data modeling features,¹ e.g., the
representation of set-theoretic structural, qualitative relationships² that exist
among the elements being modeled. Such relationships influence the modeling
process, and the mathematical models these systems represent, in several ways.
One, many potential users of mathematical modeling techniques find it easier
to conceptualize a problem in terms of the data modeling relationships rather
than the mathematical relationships [32]. Two, problem-specific data is
required for the solution of these models; it is desirable to access this data from
an existing database rather than to create, and maintain, a copy of the data
for the solver being used. Three, the structural and qualitative relationships
are often the foundations of, and the assumptions underlying, the mathematical
relationships that approximate the real problem. The ability to represent and
reason with the justifications for the mathematical formulation is particularly
useful in model formulation, in model communication, in understanding model
solutions, and in model maintenance [40].

While  several  languages  (e.g.  semantic  data  modeling  languages)  do  exist  for 
the  representation  of  such  semantic  relationships,  these  have  few  mathematical 
modeling  capabilities.  There  is,  therefore,  a  need  to  integrate  these  two  sets  of 
capabilities  within  a  single  framework.  One  way  to  achieve  such  integration  is 
to create a new unifying conceptual framework and modeling language, as has
been done in the case of structured modeling. Another is to provide systematic
means for integrating existing languages, thus allowing their users to continue
using those languages. Our research adopts the latter alternative.³

¹Structured modeling is an exception to this statement, but, as we argue in the sequel, our
approach is fundamentally different from that taken in structured modeling.

²These include what are often termed abstraction relationships (aggregation, generalization,
grouping, specialization) as well as other qualitative relationships (e.g., a supply
relationship between a set of plants and a set of customers) between model elements, which are
reducible to set-theoretic operations.

³There is another fundamental distinction between our approach for such integration and
that of structured modeling. In structured modeling, the design of the relational scheme
(elemental tables) is customized to meet the input and output requirements of a particular
structured model. (In fact, the normalized tables can be generated automatically from model
schemas.) This relational scheme could either define tables that actually store the elemental
data, or could define a view of other existing tables. In our approach, the data modeling can
be entirely independent of the data requirements of individual mathematical models. The
relationships between the data stored in the database and the inputs or outputs of the models
are captured by explicitly declared mappings.

The rest of this paper is organized as follows. In §2 we discuss the principal
benefits (from the perspective of mathematical modeling systems) that should
accrue from the integration of mathematical and data modeling features, and
illustrate these benefits with an example. In §3, we explain how such integration
can be achieved in a systematic and generalizable manner. We propose that
the embedded languages technique, proposed by Bhargava and Kimbrough [4],
is a useful technique for achieving our objectives. We use this technique to
integrate a) a generic executable mathematical modeling language $L_m$, and
b) an executable data modeling language $L_d$, within a common language. In
§4 we illustrate, using our earlier example, how the potential benefits discussed
in §2 are realized, and how the functionality of mathematical modeling systems
can be improved as a result. The final section (§5) discusses the contribution of
our work and suggests directions for further research.

2 Motivation

In this section we present our motivation for developing a language that
integrates data and mathematical modeling features. We do so by arguing that
certain desirable features can be implemented in modeling languages and systems
only if the modeling language is able to represent both data and mathematical
relationships. We begin by considering a problem faced by designers of a
telecommunication network for a hypothetical firm.

Example 1 Communications Network Design

Host computers and terminal controllers are to be connected to
concentrators. The terminals are partitioned into various clusters, with
each cluster being controlled by a terminal controller. A host
computer may also serve as a terminal controller. For most design
purposes, host computers and terminal controllers are equivalent, and
are considered customer sites. The telecommunications network
consists of connections between these customer sites and concentrators.
The concentrators and customer sites are called network elements.

Each site must be served by exactly one concentrator, though the
same concentrator may serve various sites. The average load (the
data traffic to and from the host) offered by each host computer and
each terminal is known, and is measured in bits per second. For
a terminal cluster, the sum of the loads at all the terminals in the
cluster is regarded as the load for that cluster. There is an upper
limit, called the maximum bandwidth, to the data flow that can be
handled by each concentrator.

The existing links between customer sites and concentrators are
of two types: leased links, and owned links. An existing link is
“valid” if and only if the customer site and concentrator it connects
are compatible, i.e., they support the same protocol. In general, the
cost of using an owned link is proportional to the speed of the link.
The cost of using a leased link is a non-linear function of the traffic
on the link. However, we consider a simplified scenario in which all
link usage costs are constant, irrespective of the type of link. There
are also fixed set-up costs associated with locating and operating
concentrators.

The objective is to develop a linkage and location plan that
satisfies the load requirements of customer sites at a minimum cost.
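
To make the problem statement concrete, the following is a minimal brute-force sketch of Example 1 for a toy instance. It is only an illustration under stated assumptions: the function name `cheapest_plan`, the data values, and the convention that a pair missing from `link_cost` is an invalid link are all hypothetical and do not come from the paper. Exhaustive enumeration is used purely for clarity; it is not a practical solution method.

```python
from itertools import product
from math import inf


def cheapest_plan(sites, concentrators, load, capacity, link_cost, setup_cost):
    """Enumerate every assignment of sites to concentrators; return (cost, plan).

    link_cost maps (concentrator, site) pairs to a constant usage cost.
    Pairs missing from link_cost are treated as invalid links (infinite cost).
    """
    best_cost, best_plan = inf, None
    for choice in product(concentrators, repeat=len(sites)):
        plan = dict(zip(sites, choice))  # each site is served by exactly one concentrator
        used = {c: 0 for c in concentrators}
        for site, conc in plan.items():
            used[conc] += load[site]
        if any(used[c] > capacity[c] for c in concentrators):
            continue  # some concentrator would exceed its maximum bandwidth
        cost = sum(link_cost.get((conc, site), inf) for site, conc in plan.items())
        cost += sum(setup_cost[c] for c in set(plan.values()))  # fixed set-up costs
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan


# Hypothetical toy data (not from the paper).
sites = ["s1", "s2", "s3"]
concentrators = ["c1", "c2"]
load = {"s1": 40, "s2": 30, "s3": 50}            # bits per second
capacity = {"c1": 100, "c2": 60}                 # maximum bandwidth
link_cost = {("c1", "s1"): 5, ("c1", "s2"): 4, ("c1", "s3"): 9,
             ("c2", "s1"): 6, ("c2", "s3"): 3}   # (c2, s2) is not a valid link
setup_cost = {"c1": 10, "c2": 8}

best_cost, best_plan = cheapest_plan(sites, concentrators, load, capacity,
                                     link_cost, setup_cost)
print(best_cost, best_plan)
```

In this toy instance no single concentrator can carry all three sites, so both must be operated.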

The problem of developing the linkage and location plan can be formulated
as a mathematical programming model, as represented below (the cost of using
“invalid” links is set to infinity).


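$$\text{Minimize} \quad \sum_{i \in C} \sum_{j \in S} C_{ij} X_{ij} + \sum_{i \in C} F_i Z_i \tag{1}$$

subject to

$$\sum_{i \in C} X_{ij} = 1 \qquad \forall j \in S \tag{2}$$

$$\sum_{j \in S} L_j X_{ij} \le Z_i K_i \qquad \forall i \in C \tag{3}$$

$$X_{ij},\ Z_i \in \{0, 1\} \tag{4}$$

where $X_{ij} = 1$ if concentrator $i$ serves customer site $j$, $Z_i = 1$ if
concentrator $i$ is operated, $C_{ij}$ is the cost of using the link between
concentrator $i$ and site $j$, $F_i$ is the fixed set-up cost of concentrator $i$,
$L_j$ is the load offered by site $j$, and $K_i$ is the maximum bandwidth of
concentrator $i$.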
2.1 Problem Conceptualization

Consider a fragment of the data model for example 1 shown in Figure 1. The
information about the problem contained in this model can be explained
informally as follows (formal definitions of the italicised terms are given in §3). The
nodes “Concentrators,” “Sites,” and “Terminals” represent the set of
concentrators, customer sites, and individual terminals, respectively. The nodes
“Owned-links” and “Leased-links” represent the set of owned and leased links
that exist between the concentrators and sites. Each of these link nodes is an
aggregation (a Cartesian product) of concentrators and customer sites. The
node “Links,” which represents all the available links between concentrators
and sites, is a generalization of these two link nodes. The node “Serves” is a
specialization of links, and represents those links that are operated to serve sites
from concentrators. A “Network-element” is a generalization of concentrators
and sites. “Host-computers” and “Controllers” are specializations of sites.
The node “Clusters” is a grouping-of terminals, and each cluster is controlled by a
terminal controller.

This fragment of the data model is useful in conceptualizing the original
problem since it captures, explicitly and directly, essential information about the
problem. Of course, we must note that some of the information captured in the
data model could also be represented in mathematical modeling languages. For
example, the existence of network elements, customer sites and concentrators,
and of the linkage between them, is represented in AMPL as:

• set N; set of network elements.

• set C; set of concentrators.

• set S; set of customer sites.

• set T; set of terminal controllers.

• set H; set of host computers.

• set O; set of available owned links between concentrators and customer
sites.

• set M; set of available leased links between concentrators and customer
sites.

Figure 1: Communications network design data model

However, since the mathematical formulation deals with only customer sites
and concentrators, it is unlikely that N, T, H, O, and M would be mentioned
at all in the representation. While executable modeling languages (EMLs) provide
an excellent representation of the information necessary for solving a given
mathematical formulation for a problem, they fall short in capturing other
information relevant to the original problem. We illustrate this by considering a
few other aspects of the problem.

1.  Both  concentrators  and  customer  sites  are  network  elements.  (A  network 
element  is  a  generalization  of  concentrators  and  customer  sites.) 

2.  There  are  two  kinds  of  customer  sites,  hosts  and  terminal  controllers. 
These  two  kinds  are  not  mutually  exclusive,  since  a  host  computer  can 
also  be  a  terminal  controller.  (Host  computers  and  terminal  controllers 
are  both  a  specialization  of  customer  sites.) 

3. There is a link-cost associated with both owned links and leased links
between concentrators and customer sites. There is also a link-speed
associated with each owned link between concentrators and sites.

4.  A  cluster  is  a  group  of  terminals. 

These specific items of information about the problem are captured
adequately using constructs of a data modeling language (see Figure 1), but
cannot be represented adequately in existing mathematical modeling languages, as
discussed below.⁴

First,  note  that  both  the  generalization  and  specialization  relationships  (items  1 
and  2  above)  are  represented  in  EMLs,  such  as  AMPL  or  GAMS,  using  the  same 
set-theoretic  operator: 

• $N = C \cup S$, and

• $H \subset S$, $T \subset S$, $S = H \cup T$.

Due to this semantic overloading of the set union operator, the qualitative
distinction between these two kinds of relationships is lost. Second, the attributes
of the relationships between objects (e.g., link-speed and link-cost; item 3 above)
can be represented in EMLs only by using indexed variables (where the index
sets represent the objects), whereas data modeling languages directly represent
these as attributes of the relationships. In the data model, link-speed is
represented as an attribute of the relationship owned-links between concentrators
and sites, and link-cost is an attribute of links (see Figure 1). In the EML
representation, however, the indexed variables $C_{ij}$ do not convey information as
to which relationship (owned-link or leased-link) they are attributes of. Third,
the grouping relationship (item 4) has no adequate counterpart in mathematical
modeling languages [25].
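
The loss of information described above can be sketched in a few lines of Python. This is an illustrative sketch only: the instance data, the `relationships` list, and the helper names `kinds_of` and `relationship_of` are hypothetical and are not part of AMPL, GAMS, or the language developed in this paper.

```python
# Hypothetical instance data (not from the paper).
concentrators = {"c1", "c2"}
hosts = {"h1", "h2"}
controllers = {"h2", "t1"}  # h2 is both a host and a terminal controller

# Pure set view, as in an EML: both relationships are written as unions,
# so nothing records which one is a generalization and which a specialization.
sites = hosts | controllers
network_elements = concentrators | sites

# Data-model view: the kind of each relationship is kept as explicit metadata.
relationships = [
    ("generalization", "network-elements", ("concentrators", "sites")),
    ("specialization", "sites", ("hosts", "controllers")),
]


def kinds_of(name):
    """Return the kinds of abstraction relationship recorded for an object type."""
    return [kind for kind, target, _ in relationships if target == name]


# Attributes are attached to a named relationship, not to an anonymous index pair.
owned_links = {("c1", "h1"): {"link-speed": 9600, "link-cost": 12}}
leased_links = {("c2", "t1"): {"link-cost": 20}}


def relationship_of(pair):
    """Say which relationship a (concentrator, site) pair belongs to, if any."""
    if pair in owned_links:
        return "owned-links"
    if pair in leased_links:
        return "leased-links"
    return None


print(kinds_of("sites"), relationship_of(("c1", "h1")))
```

With only the two union expressions, the first and second relationship are indistinguishable; the tagged records retain the distinction, and a cost value can be traced back to the relationship it describes.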

In general, the evolution of semantic data modeling languages has been
guided by the need to provide constructs for the direct and explicit
representation of structural relationships between objects. Thus, the inclusion of data
modeling constructs in languages for mathematical modeling should facilitate
problem conceptualization. For example, in a successful application of
management science techniques to the scheduling of the 1992 Olympic games, Andreu
and Corominas begin by developing an entity-relationship data model of the
problem [3]. Fourer argued that an algebraic representation of the mathematical
formulation reduces problems of verification, modification, and documentation,
is more readable and understandable, makes use of powerful abstractions
commonly used by modelers, and is independent of particular algorithms [21]. In
a similar way, a representation of the structural and qualitative relationships,
using data modeling constructs, facilitates verification, modification, and
documentation of the problem, and is independent of a particular mathematical
formulation of it. This, we shall see, is a significant consideration in model
revision and version management.

⁴It is not surprising that this is the case, since such information is not required to solve
the model, and since EMLs primarily aim to provide an alternative (to matrix generators)
representation that can be transformed to that required by a solver.

2.2  Ensuring  Integrity  of  Data 

Consider, in example 1, the statement “An existing link is ‘valid’ if and only if
the customer site and concentrator it connects are compatible, i.e., they support
the same protocol.” It essentially states that the problem data on available links
should include no pair (i, j) such that concentrator i and customer site j are not
compatible. In other words, the sets of owned links and leased links should both
be subsets of the set of elements (i, j) where i and j have the same protocol. For
any invalid (i, j) pair, the cost of using that link is assumed to be infinite. This
information is represented in a data modeling language as integrity constraints,
and is enforced via statements equivalent to the expressions below.

$$O \cup M \subseteq \{(i,j) : i \in C,\ j \in S,\ \mathrm{protocol}(i) = \mathrm{protocol}(j)\} \tag{5}$$

$$((i,j) \notin O \cup M) \rightarrow (\text{link-cost}((i,j)) = \infty) \tag{6}$$
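
As an illustration of how such integrity constraints might be checked on problem data, here is a minimal Python sketch. The function names, the protocol labels "P1" and "P2", and the data are hypothetical; the sketch only mirrors the two expressions above and is not the enforcement mechanism of any particular database system.

```python
from math import inf


def invalid_links(links, protocol):
    """Return the links that violate expression (5): endpoints with different protocols."""
    return {(i, j) for (i, j) in links if protocol[i] != protocol[j]}


def link_cost(pair, links, cost):
    """Expression (6): a pair that is not an available link has infinite cost."""
    return cost[pair] if pair in links else inf


# Hypothetical data (not from the paper).
protocol = {"c1": "P1", "c2": "P2", "s1": "P1", "s2": "P2"}
owned = {("c1", "s1")}
leased = {("c2", "s2"), ("c2", "s1")}  # (c2, s1) mixes P2 with P1
cost = {("c1", "s1"): 5, ("c2", "s2"): 7, ("c2", "s1"): 6}

violations = invalid_links(owned | leased, protocol)
valid = (owned | leased) - violations
print(violations, link_cost(("c2", "s1"), valid, cost))
```

The same check can run before a model is solved (as input validation) or after it (on a proposed solution), which is the sense in which the constraint belongs to the data rather than to any one model.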

Note that, viewed by the user of an EML, this is a constraint on the “input”
data for the model, and is thus a pre-processing problem (unlike the constraint
“Each site must be served by exactly one concentrator,” which is a constraint on the
solution). From a data modeling viewpoint, however, this is simply an integrity
constraint on the data: it might constrain the inputs for one model, and the
solution of another. In general, data modeling languages and database systems
emphasize features for ensuring integrity of data. This (i.e., for input data) is
typically not considered a function of a mathematical modeling language; it is
assumed that the integrity of problem-specific data has been ensured externally.⁵

⁵Recent extensions to AMPL do allow modelers to declare a “check” statement to ensure
2.3 Model Formulation


Consider the modeling variables in the network design problem of example 1.
Each variable used in the mathematical programming formulation represents a
problem-specific concept. For instance, the variable $X_{ij}$ denotes the existence
(or lack of it) of a link between concentrator i and customer site j. In the
data model for this problem, the aggregation node links represents the concept
of concentrator-site linkage, which concept causes the inclusion of the variable
$X_{ij}$ in the mathematical formulation. Similarly, consider the constraint
$\sum_{i \in C} X_{ij} = 1$. This constraint is derived from the statement “Each site must be served by
exactly one concentrator, though the same concentrator may serve various sites”
in the problem description. In the data model, this statement is represented in
the functional dependency between sites and concentrators:

$$\text{sites} \rightarrow \text{concentrators}. \tag{7}$$

This  component  of  the  data  model  serves  to  justify  the  presence,  and  specific 
form,  of  this  constraint  in  the  mathematical  model.  This  is  depicted  in  Figure  2 
which  shows  the  relevant  fragment  of  the  justification  network  for  this  model. 
In general, in formulating a model, one identifies the modeling variables and
specifies the relationships between them. Each of these components is
introduced by the modeler to represent some aspect of the problem being modeled.
Previous studies in model formulation [39, 40] have found that expert modelers
explain their formulations by relating components of the model to the objects
and relationships in the problem statement. Since the data model is a
qualitative representation of a problem, components of the mathematical model can be
justified in terms of some element(s) of the data model. Thus, the information
formalized  in  the  data  model  serves  two  related  purposes  in  the  formulation  of 
the  mathematical  model.  First,  this  information  is  useful  in  the  creative  part  of 
model  formulation,  such  as  in  introducing  a  new  variable  or  a  new  constraint. 
Second,  this  information  is  useful  in  justifying  components  of  the  mathematical 
model,  and  serves  as  useful  active  documentation  of  the  same. 
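
The link between the functional dependency and the assignment constraint can be sketched as follows. This is a hedged illustration: the function `is_functional` and the sample relations are hypothetical, and the check simply restates, in set terms, that each site appears in exactly one (concentrator, site) pair of the “Serves” relationship.

```python
def is_functional(serves, sites):
    """True when every site occurs in exactly one (concentrator, site) pair.

    This is the set-level reading of the dependency sites -> concentrators,
    and it holds exactly when the sum over i of X[i, j] equals 1 for every site j.
    """
    return all(sum(1 for (_, s) in serves if s == site) == 1 for site in sites)


# Hypothetical data (not from the paper).
sites = ["s1", "s2"]
serves_ok = {("c1", "s1"), ("c1", "s2")}    # one concentrator may serve several sites
serves_bad = {("c1", "s1"), ("c2", "s1")}   # s1 is served twice, s2 not at all

print(is_functional(serves_ok, sites), is_functional(serves_bad, sites))
```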
Figure 2: Fragment of the justification network


2.4  Model  Reformulation  and  Version  Management 

Now consider the following modification to the problem being modeled. In the
original statement, we considered a scenario in which “all link usage costs [were]
constant, irrespective of the type of link.” Consider, now, a scenario in which
the link usage costs are measured more accurately. First, we must distinguish
between the cost of using owned links (call it $C'_{ij}$) and the cost of using leased
links (call it $C''_{ij}$). Second, the $C'_{ij}$'s should be computed as some function $g$ of
the link speed, and the $C''_{ij}$'s should be estimated using a model, which specifies
a non-linear functional relationship between the cost and volume of traffic on a
link:

$$C'_{ij} = g(S_j) \tag{8}$$

$$C''_{ij} = f(L_j, \sigma^2(L_j)) \tag{9}$$

where $S_j$ is the speed of link j, $L_j$ is the average load on host j, $\sigma(L_j)$ is the
standard deviation of the load on host j, and $f$ is a specified non-linear function.
The cost function $C_{ij}$ can now be written as

$$C_{ij} = \begin{cases} C'_{ij} & \text{if the link } (i,j) \text{ is owned} \\ C''_{ij} & \text{if the link } (i,j) \text{ is leased} \end{cases} \tag{10}$$
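
A minimal sketch of the piecewise cost function may help fix ideas. The functions `g` and `f` below are arbitrary placeholders chosen only so the example runs (the paper does not specify them), and the data values are hypothetical. Following the subscripts used in the text, speed and load are indexed by `j`.

```python
def link_usage_cost(pair, owned, leased, speed, load, load_variance, g, f):
    """Evaluate the piecewise cost of expression (10) for one link (i, j)."""
    _, j = pair
    if pair in owned:
        return g(speed[j])                    # expression (8)
    if pair in leased:
        return f(load[j], load_variance[j])   # expression (9)
    raise KeyError(f"{pair} is neither an owned nor a leased link")


# Placeholder cost functions: assumptions for illustration only.
def g(s):
    return s // 100        # cost proportional to link speed


def f(l, v):
    return l * l + v       # some non-linear function of load and its variance


# Hypothetical data (not from the paper).
owned = {("c1", "s1")}
leased = {("c2", "s2")}
speed = {"s1": 9600}
load = {"s2": 3}
load_variance = {"s2": 4}

owned_cost = link_usage_cost(("c1", "s1"), owned, leased, speed, load, load_variance, g, f)
leased_cost = link_usage_cost(("c2", "s2"), owned, leased, speed, load, load_variance, g, f)
print(owned_cost, leased_cost)
```

Note that the branch taken depends on which relationship the pair belongs to, which is exactly the information the data model records and the indexed variable alone does not.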

These  modifications  result  effectively  in  a  new  mathematical  formulation, 
with  a  non-linear  objective  function.  However,  note  that  the  underlying  data 
model  is  still  the  same,  and  that  the  new  expressions  in  the  formulation  can 
still be justified by components of the original data model. In fact, these
modifications are consequences of considering attributes that were already
present in the data model but were previously ignored.
For a given problem, several models may be formulated and explored in the
course of model development. We refer to each distinct model that results from
such formulations as a version. Each version is the result of making certain
assumptions about the problem. In the current example, we replaced the
assumption about constant link usage costs for all links with one where there were
different costs for leased and owned links. The effect of the new assumption was
to replace the total usage cost expression $\sum_{i \in C} \sum_{j \in S} C_{ij} X_{ij}$ with a new
usage cost expression (expressions 8-10). When model formulations are large or
complex, it is a non-trivial task to identify and replace model components that
are affected, directly or indirectly, by changes in assumptions. However, this can
be facilitated by examining the justifications, in terms of elements of the data
model, for each component of the mathematical model. A model component
that does not have at least one justification can be retracted as it lacks support
in the version being created.
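
The retraction rule just described can be sketched with a simple mapping from model components to their justifications. The component names, justification labels, and the function `retract_unsupported` are hypothetical illustrations, not the representation used in the system described in this paper.

```python
def retract_unsupported(justifications, withdrawn):
    """Remove withdrawn justifications and separate supported from unsupported components.

    justifications maps each model component to a set of data-model elements
    or assumptions that justify it. A component left with no justification
    is reported as retracted.
    """
    kept, retracted = {}, []
    for component, reasons in justifications.items():
        remaining = reasons - withdrawn
        if remaining:
            kept[component] = remaining
        else:
            retracted.append(component)
    return kept, retracted


# Hypothetical justification records (labels are illustrative).
justifications = {
    "variable X[i,j]": {"node: links"},
    "constraint (2)": {"dependency: sites -> concentrators"},
    "objective term: constant link cost": {"assumption: constant link usage costs"},
}

kept, retracted = retract_unsupported(
    justifications, {"assumption: constant link usage costs"}
)
print(retracted)
```

Dropping the constant-cost assumption leaves the assignment constraint and the link variable supported, and flags only the cost term for replacement.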

Further, comparing the original and new formulation for the communications
network problem, we see that the models share a great deal of structure:
apart from their objective functions they are the same models. One plausible
approach to reformulation is to start with the original model and its underlying
justifications, and to selectively make the changes required to create a new
version. The original data model (or, in general, a substantial part of it) can be
re-used to justify and document the new formulation. Thus, the data model and
the associated justifications support this kind of version creation, by promoting
re-use of previous modeling effort.

When several model versions are created, one may lose track of the similarities
and differences between versions. Software tools are available for mitigating
this problem. For instance, if each version is stored in a file, an operating system
utility (such as diff in the UNIX system) can be used to determine differences
in terms of lines present in one file that are absent in another. In the network
design example, diff may be used to determine that the objective function
in one version is written in terms of the constant link costs $C_{ij}$, while the
objective function in another is written in terms of the cost functions $g$ and $f$.
However, such utilities lack means to provide reasons as to why these objective
functions are different. The data model and associated justifications can be used
to do that and much more, as shown below.

Figure 3: Fragment of version graph: Communications network design data
model

Consider the version graph for the communications network problem. Each
node in this graph represents a model version for the problem. A directed arc
from a node $V_1$ to node $V_2$ indicates that version $V_2$ was obtained by modifying
version $V_1$. The labels on the arcs represent the assumptions that were deleted
and/or added in order to move from $V_1$ to $V_2$. A fragment of this graph is
shown in Figure 3. Thus, this graph captures the sequence in which various
versions were developed, the differences between versions, as well as the reasons
for developing new model versions. It documents the chronology of the model
development process, and can be a useful repository of modeling experience.
Complex models can be hard to solve, and successful modelers often evolve
solution strategies by developing several versions [37]. They solve relaxed models,
selectively increase the complexity by introducing new assumptions, and use the
solutions from the relaxed model to solve the new model version. For a given
problem, clues about a useful solution strategy can be obtained by examining
the chronology of successful model development processes for related models.
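
A version graph of this kind can be sketched as a dictionary of labeled arcs. The version names, the arc labels (in particular the second arc), and the helper functions are hypothetical; the sketch also assumes that each version has at most one parent and that the graph is acyclic.

```python
def lineage(arcs, version):
    """Return the chain of versions from the root to the given version."""
    parent = {child: par for (par, child) in arcs}
    path = [version]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))


def changes_along(arcs, version):
    """List the assumptions deleted and added on each step leading to a version."""
    path = lineage(arcs, version)
    return [(a, b, arcs[(a, b)]) for a, b in zip(path, path[1:])]


# Hypothetical version graph: arcs are labeled with deleted and added assumptions.
arcs = {
    ("V1", "V2"): {"deleted": ["constant link usage costs"],
                   "added": ["separate costs for owned and leased links"]},
    ("V2", "V3"): {"deleted": [],
                   "added": ["illustrative further assumption"]},
}

print(lineage(arcs, "V3"))
for step in changes_along(arcs, "V3"):
    print(step)
```

Unlike a line-level file comparison, the arc labels record why two versions differ, which is the information a modeler needs when reconstructing a solution strategy.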

2.5  Discussion 

We have illustrated several benefits that would result from integrating features of
data modeling languages into mathematical modeling languages. Put together,
we believe they make a convincing case for providing such integration. In our
view, one of the most significant implications of such integration is that it allows
a modeler to document justifications, in terms of elements of the data model, for
various components of a mathematical model. There are several benefits that
follow from the availability of such justifications. These include improvements
in a modeling system's ability to explain and communicate the model, to track
changes to models, to explain similarities and differences between various
versions of a model, and to examine consistency of a model formulation in terms
of its justifications. Having made the case that such integration is desirable,
how do we achieve it? We present our approach to the integration of data and
mathematical modeling languages in the next section.
3 A Formalism for Language Integration


Our method for language integration is based on the embedded languages
technique developed by Bhargava and Kimbrough [4]. The embedded languages
technique provides a systematic means for integrating multiple “embedded”
languages within a single “embedding” language. In our context, one embedded
language is a generic data modeling language $L_d$ and the other embedded
language is a generic mathematical language $L_m$. Following the convention of
Bhargava and Kimbrough, the embedding language will be called $L_\uparrow$.

3.1  A  Generic  Data  Modeling  Language 

We begin by formalising a generic semantic data modeling language, which we
call $L_d$. The language supports data modeling constructs (aggregation,
grouping, generalization, and specialization) as a means for the direct and explicit
representation of structural relationships between data elements. Our
development of the language is based on the set theoretic development described in [29].
The reader may find it useful to refer to Figures 1 and 4, where we represent
pictorially and textually, respectively, the data model for the communications
network model (example 1).

In  a  data  modeling  language,  the  real-world  is  conceptualized  as  a  collection 
of objects and relationships between these objects. An object type is a set of
objects. For any object type A we will denote the set of objects that it represents
by A. The specification of an object type may be either primitive or compound.
A  semantic  data  modeling  language  provides  a  set  of  abstraction  relationships 
that  are  used  to  specify  compound  object  types  in  terms  of  other  object  types, 
as  well  as  to  capture  relationships  between  object  types. 

Definition  1  Primitive  Specification  of  an  Object  Type 

A  primitive  specification  of  an  object  type  consists  of  a  definition  for 
the  object  type  as  a  set  of  objects  drawn  from  a  collection  of  known 
domains.  The  commonly  occurring  domains  are  the  sets  of  reals, 
integers,  boolean,  and  strings.  Other  domains,  subsets  of  these, 
may  be  defined  by  users  of  the  language. 
    ## PRIMITIVE OBJECT TYPES ##
    object-type(terminals);      # individual terminals
    object-type(sites);          # customer sites
    object-type(controllers);    # terminal controllers

    ## COMPOUND OBJECT TYPES (SEMANTIC RELATIONSHIPS) ##
    aggregation-of([concentrators, sites], owned-links);          # owned-links
    aggregation-of([concentrators, sites], leased-links);         # leased-links
    aggregation-of([clusters, controllers], controlled-by);       # controllers of clusters
    group-of(terminals, clusters);                                # clusters
    generalization-of([sites, concentrators], network-elements);  # network-elements
    specialization(owned-links, links);                           # links
    specialization(leased-links, links);                          # links
    specialization-of(sites, hosts);                              # host computers
    specialization-of(sites, controllers);                        # concentrators
    specialization-of(links, serves);                             # serves

    ## FUNCTIONS (ATTRIBUTES) ##
    range-of(setup-cost(concentrators), reals)
    range-of(operation-cost(concentrators), reals)
    range-of(max-bandwith(concentrators), integers)
    range-of(link-speed(owned-links), reals)

Figure 4: Data model for communications network design expressed in L_d


Example 2 Primitive Object Type: Host Computers

The collection of host computers, labelled hosts, has a primitive
specification in our example. The object type hosts is defined as a
collection of objects that denote specific host computers, say host-1,
..., host-n. The terms host-1, ..., host-n are drawn from the domain
of strings. In L_d this specification is achieved by statements such as
the following:

    object(host-1)
    object-type(hosts)
    element-of(host-1, hosts)

which represent that hosts is an object type and the object host-1
belongs to that type.

Definition  2  Compound  Specification  of  an  Object  Type 

A  compound  specification  of  an  object  type  consists  of  a  definition 
of  that  object  type  in  terms  of  other  object  types  and  one  of  the 
following abstraction relationships: aggregation, generalization,
specialization, and grouping.

Each  of  these  abstraction  relationships  and  its  use  in  object  specification  is 
discussed  below. 

Definition  3  Aggregation 

The aggregation of a set of object types A_1, ..., A_n is an object type
A such that the set of objects represented by A is a subset of the
Cartesian product of the sets of objects represented by A_1, ..., A_n,
i.e.,

$$A \subseteq \bigotimes_{i=1}^{n} A_i$$


Example  3  Compound  Specification:  Aggregation 
The object types owned-links and leased-links, which represent
connections between concentrators and customer sites (see example 1),
are both aggregations of concentrators and sites. These are a
collection of 2-tuples of the form (c, s) where c is a concentrator and s is
a customer site. This information is represented in the data model
by the aggregation nodes owned-link and leased-link, and is specified
in the language as follows:

    aggregation-of([concentrators, sites], owned-links)
    aggregation-of([concentrators, sites], leased-links)
    element-of(<concentrator-1, host-1>, owned-links)
    element-of(<concentrator-2, controller-3>, leased-links)

These statements represent that owned-links and leased-links are
object types formed by aggregating concentrators and sites, that a
specific owned link is the link between concentrator-1 and host-1,
and that a specific leased link is the link between concentrator-2 and
controller-3.
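
The subset condition of Definition 3 can be checked mechanically. The following is a minimal Python sketch, not part of L_d; the function name is_aggregation and the sample sets are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical sample data, loosely following Example 3.
concentrators = {"concentrator-1", "concentrator-2"}
sites = {"host-1", "controller-3"}

def is_aggregation(objects, component_sets):
    """Return True if every object is a tuple from the Cartesian
    product of the component sets (the condition in Definition 3)."""
    cartesian = set(product(*component_sets))
    return set(objects) <= cartesian

owned_links = {("concentrator-1", "host-1")}
leased_links = {("concentrator-2", "controller-3")}
bad_links = {("host-1", "concentrator-1")}  # components in the wrong order

print(is_aggregation(owned_links, [concentrators, sites]))
print(is_aggregation(bad_links, [concentrators, sites]))
```

Materializing the full product is acceptable only for small sets; a membership test per tuple component would scale better.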

Definition  4  Grouping 

The grouping over an object type A is an object type B such that
the set of objects represented by B is the power set of the set of objects
represented by A. Thus, any subset of A is an object of type B, and

$$B = \{S : S \subseteq A\}$$

Example  4  Compound  Specification:  Grouping 

A  terminal  controller  unit  controls  a  cluster,  i.e.,  a  collection,  of 
terminals.  A  particular  cluster  is  some  subset  of  the  set  of  objects 
of  type  terminals.  The  object  type  clusters  is  represented  in  a  data 
model  as  a  grouping  of  terminals,  and  is  specified  in  the  language  as 
follows: 

    group-of(terminals, clusters)
    element-of({terminal-1, terminal-2, terminal-3}, clusters)
    element-of({terminal-5, terminal-4}, clusters)
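
To make the power-set reading of Definition 4 concrete, here is a small Python sketch. The helper power_set and the five sample terminals are assumptions for illustration; the paper does not prescribe any implementation.

```python
from itertools import chain, combinations

def power_set(base):
    """Return all subsets of base as frozensets (Definition 4)."""
    items = sorted(base)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

# Hypothetical terminals terminal-1 ... terminal-5.
terminals = {f"terminal-{n}" for n in range(1, 6)}
clusters = power_set(terminals)

cluster_a = frozenset({"terminal-1", "terminal-2", "terminal-3"})
cluster_b = frozenset({"terminal-5", "terminal-4"})

print(cluster_a in clusters, cluster_b in clusters)
```

In practice one would test `candidate <= terminals` rather than enumerate the power set, whose size doubles with every added terminal.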

Definition 5 Specialization

A specialization of an object type A is an object type B such that
the set of objects represented by B is a subset of the set of objects
represented by A, i.e., B ⊆ A. The objects of type B inherit the
structure of the objects of type A. There can be several
specializations of an object type, and these specializations are not required to
be disjoint.

Example  5  Compound  Specification:  Specialization 

In example 1, host computers and clusters of terminals are
equivalent, and are referred to as customer sites, and a host computer
may also serve as a terminal controller. The hosts and controllers
are represented in a data model as specializations of the object type
sites, and are specified in the language as follows:

    specialization-of(sites, hosts)
    specialization-of(sites, controllers)
    element-of(host-1, hosts)
    element-of(host-1, controllers)
    element-of(concentrator-1, concentrators)

The  specialization  relationship  allows  the  intersection  of  the  sets 
hosts  and  controllers  to  be  non-null,  which  is  indeed  the  case  here 
since  host-1  is  both  a  host  computer  and  a  terminal  controller. 

Definition  6  Generalization 

A generalization of a set of object types A_1, ..., A_n is an object
type A such that the set of objects represented by A contains all the
objects represented by A_1, ..., A_n, i.e.,

$$A = \bigcup_{i=1}^{n} A_i$$

The sets A_1, ..., A_n are assumed to be pair-wise disjoint.

Example 6 Compound Specification: Generalization

In  example  1,  a  network  element  is  either  a  concentrator  or  a  host 
computer.  In  the  data  model,  the  object  type  network-elements  is 
a  generalization  of  the  object  types  concentrators  and  hosts,  and  is 
specified  in  the  language  as  follows: 

    generalization-of([sites, concentrators], network-elements)
    element-of(host-5, network-elements)
    element-of(concentrator-1, network-elements)
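
The difference between specialization (overlap allowed) and generalization (pair-wise disjoint parts whose union is the whole) can be sketched in Python as follows. The function names and the sample objects are assumptions for illustration.

```python
from itertools import combinations

def is_specialization(sub, sup):
    """Definition 5: the specialized type is a subset of its parent."""
    return sub <= sup

def is_generalization(general, parts):
    """Definition 6: parts are pair-wise disjoint and their union
    is exactly the generalized type."""
    disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    return disjoint and set().union(*parts) == general

# Hypothetical sample objects.
hosts = {"host-1", "host-5"}
controllers = {"host-1", "controller-3"}   # host-1 is also a controller
sites = hosts | controllers
concentrators = {"concentrator-1"}
network_elements = sites | concentrators

print(is_specialization(hosts, sites))
print(is_generalization(network_elements, [sites, concentrators]))
print(is_generalization(sites, [hosts, controllers]))  # overlap on host-1
```

The last call fails the disjointness test, which is why hosts and controllers are modeled as specializations of sites rather than sites as their generalization.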

The abstractions discussed above are useful in capturing some of the
commonly occurring data structures and relationships between data. The constructs
specialization and generalization capture the "hierarchical" relationships among
objects in the problem domain, while aggregation and grouping capture the
"horizontal" relationships. In addition, functional relationships among object types
are  represented  as  attributes  of  the  object  types  in  the  data  model.  For  example, 
the  cost  of  operating  concentrators  is  represented  as  an  attribute  operation-cost 
of  concentrators. 

To summarize, L_d is a specialized first-order language with a) an open
vocabulary of individual constants representing objects and object types in the data
model, b) an open vocabulary of function constants representing attributes of
objects or object types, and c) the following special predicate constants (in the
notation below, A, A_1, ..., A_n, and B denote individual constants):

•  object: A unary predicate, such that the statement object(A) asserts that
A is an object,

•  object-type: A unary predicate, such that the statement object-type(A)
asserts that A is an object type,

•  element-of: A binary predicate, such that element-of(A, B) asserts that
the object A is an element of the object type B,*

* Here, and elsewhere, it is not necessary that the predicate be defined extensionally in L_d.
For example, the objects belonging to a certain object type might be declared by pointing,
using a query language, to a column in a database that stores the objects of that type.
•  aggregation-of: A binary predicate, such that aggregation-of([A_1, ..., A_n], B)
asserts that the object type B is an aggregation of the object types
A_1, ..., A_n,

•  grouping-of: A binary predicate, such that grouping-of(A, B) asserts
that the object type B is formed as a grouping over object type A,

•  specialization-of: A binary predicate, such that specialization-of(A, B)
asserts that the object type B is a specialization of object type A,

•  generalization-of: A binary predicate, such that generalization-of([A_1, ..., A_n], B)
asserts that the object type B is a generalization of the object types
A_1, ..., A_n,

subtype(A, B) declares that the primitive

•  function-range: A binary predicate, such that function-range(f(A), B)
represents that the range of function f with domain A is B.

As a simple example, if L_d were to be a relational data modeling language,
the names of the relation schemes would be object types in L_d, and the columns
of these relations would be function constants in L_d.

3.2 A Generic Mathematical Modeling Language L_m

For our purposes in this paper, any of the existing executable algebraic modeling
languages (such as AMPL, GAMS, L↓, LINGO) could serve as the
mathematical modeling language L_m. While there are some differences between these
languages, they are very similar in their basic structure and in the characteristics
that we are concerned with in this paper. Hence instead of specifying a new
language, or of illustrating our ideas on a specific language, we will assume a
generic modeling language. For details on any of these particular languages, we
refer the reader to the appropriate references mentioned in §1. An executable
modeling language based on first-order logic is discussed in [6]. Here, we restrict
ourselves to a simple illustration of this language, by representing the model of
example 1 in an AMPL-like syntax (see Figure 5).
    ### SETS ###
    set C;    # Concentrators
    set S;    # customer Sites (hosts and controllers)

    ### PARAMETERS ###
    param cost {C,S} > 0;    # cost(i,j) of using link between concentrator i and site j
    param fcost {C} > 0;     # fixed setup cost(i) of locating and operating concentrator i
    param load {S} >= 0;     # load(j) at customer site j
    param k {C} > 0;         # bandwidth(i) of concentrator i

    ### VARIABLES ###
    var x {C,S} binary;      # x(i,j) = 1 if concentrator i serves site j, 0 otherwise
    var z {C} binary;        # z(i) = 1 if concentrator i is operated, 0 otherwise

    ### OBJECTIVE FUNCTION ###
    minimize total_cost:
        sum {i in C} (sum {j in S} (cost[i,j] * x[i,j]) + fcost[i] * z[i]);

    ### CONSTRAINTS ###
    subject to linkages {j in S}: sum {i in C} (x[i,j]) = 1;
        # each site must be served by exactly 1 concentrator

    subject to capacity {i in C}: sum {j in S} (load[j] * x[i,j]) <= z[i] * k[i];
        # each concentrator (if open) has a bandwidth capacity

Figure 5: Communications network design model expressed in L_m
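
To see what the model in Figure 5 computes, the sketch below evaluates it by brute-force enumeration on a tiny instance. This is only an illustration in Python: the data values and the function name solve_by_enumeration are assumptions, and a real modeling system would hand the model to an integer programming solver instead.

```python
from itertools import product

def solve_by_enumeration(C, S, cost, fcost, load, k):
    """Enumerate every assignment of sites to concentrators.
    Assigning each site to exactly one concentrator enforces the
    'linkages' constraint; z[i] = 1 exactly for concentrators in use."""
    best_value, best_assign = None, None
    for choice in product(C, repeat=len(S)):
        assign = dict(zip(S, choice))
        opened = set(choice)
        # 'capacity' constraint: load served by i must not exceed k[i]
        used = {i: sum(load[j] for j in S if assign[j] == i) for i in opened}
        if any(used[i] > k[i] for i in opened):
            continue
        value = (sum(cost[assign[j], j] for j in S)
                 + sum(fcost[i] for i in opened))
        if best_value is None or value < best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign

# Hypothetical instance: two concentrators, three sites.
C = ["c1", "c2"]
S = ["s1", "s2", "s3"]
cost = {("c1", "s1"): 1, ("c1", "s2"): 2, ("c1", "s3"): 4,
        ("c2", "s1"): 3, ("c2", "s2"): 2, ("c2", "s3"): 1}
fcost = {"c1": 5, "c2": 4}
load = {"s1": 2, "s2": 2, "s3": 2}
k = {"c1": 4, "c2": 4}

value, assignment = solve_by_enumeration(C, S, cost, fcost, load, k)
print(value, assignment)
```

Because each concentrator can carry at most two of the three sites, both must be opened, and the cheapest links then give a total cost of 13.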
3.3 Integrating L_d and L_m in L↑

In [4], Bhargava and Kimbrough developed the embedded languages technique
and explained how an algebraic modeling language (a specialized first-order
logic language, L↓) is embedded in L↑. The embedded languages technique
provides a systematic and rigorous means for integrating, and reasoning about,
multiple (embedded) languages. In that framework, an embedded language
(L↓), which models a semi-formal target language, is embedded within
an embedding language (L↑), which is used to represent information about
formulas and terms in the embedded language, and to translate one embedded
language into another.

Central to the embedded languages technique is the idea of an image function
I and a translation function T. An embedding is a triple (I, T, A), where A
is a collection of formulas in L↑ that represents the rules of inference and
transformation of L↓. The image function I uniquely maps all expressions
(terms and formulas) in L↓ into terms in L↑. The translation function T uniquely
maps the images of all formulas in L↓ into formulas in L↑. Therefore, in order
to embed L_d and L_m in L↑, we require functions I and T such that a) the
well-formed formulas as well as terms of L_d and L_m be interpretable as terms
in L↑, and that b) there be a formula in L↑ corresponding to, and making an
assertion regarding, each formula in L_d and L_m.

Bhargava  and  Kimbrough  discussed  the  image  and  translation  functions  in 
detail in the context of embedding an algebraic modeling language. The
functions are developed along similar lines for embedding a data modeling language.
In what follows, we focus on the predicates that are required to relate
information across the two embedded languages. Of these, the predicates used to
represent  justification  networks  and  the  predicates  that  declare  the  belief  status 
of  the  components  that  are  the  nodes  of  the  networks  are  formalizations  of  the 
work  reported  in  [40]. 

To begin with, assume that there are predicates wff-L_d and wff-L_m in L↑
with the following interpretation:

•  wff-L_m(I(φ)) states that φ is a wff in L_m

•  wff-L_d(I(ψ)) states that ψ is a wff in L_d
data model is not mapped to any variable in the mathematical model,
that  might  suggest  that  a  new  variable  needs  to  be  introduced. 

4.2  Ensuring  Integrity  of  Data 

Consider, again, the statement "Each site must be served by exactly one
concentrator." It implies that for any objects i_1, i_2, and j, if (i_1, j) and (i_2, j) are both
objects of type serves and if i_1 and i_2 are distinct, then there is a problem with
the data, i.e.,

$$
\begin{aligned}
&((i_1, j) \in \text{serves}) \land ((i_2, j) \in \text{serves}) \land (i_1 \neq i_2) \\
&\land\, (i_1 \in \text{concentrators}) \land (i_2 \in \text{concentrators}) \land (j \in \text{sites}) \\
&\rightarrow \text{not-ok}(((i_1, j) \in \text{serves AND } (i_2, j) \in \text{serves}))
\end{aligned}
\tag{11}
$$

This  is  represented  by  the  following  formula: 


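$$
\begin{aligned}
&\text{wff-}L_d(I(\text{element-of}((i_1, j), \text{serves}))) \land \text{wff-}L_d(I(\text{element-of}((i_2, j), \text{serves}))) \\
&\land\, (I(i_1) \neq I(i_2)) \land \text{wff-}L_d(I(\text{element-of}(i_1, \text{concentrators}))) \\
&\land\, \text{wff-}L_d(I(\text{element-of}(i_2, \text{concentrators}))) \land \text{wff-}L_d(I(\text{element-of}(j, \text{sites}))) \\
&\rightarrow \text{not-ok}(I(\text{element-of}((i_1, j), \text{serves})) \text{ AND } I(\text{element-of}((i_2, j), \text{serves})))
\end{aligned}
\tag{12}
$$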

The ability to make such statements means that constraints that are
normally not part of the mathematical formulation can now be included in the
model representation. In fact, statements of this sort can be derived from a
more general formula (as explained below), which means that such
constraints can be enforced simply by declaration of the functional dependencies in
the data model.

Let us introduce in L↑ a predicate f-d such that the formula f-d(I(φ_1), I(ψ), I(φ_2))
means that there is a functional dependency of the form φ_1 → φ_2 in the data
model. This dependency is meant to ensure that elements
(i_1, j) and (i_2, j) can both belong to ψ only if i_1 is equal to i_2. This rule is
stated in L↑ as indicated below.
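$$
\begin{aligned}
\forall i_1 \forall i_2 \forall j\, (&\text{wff-}L_d(I(\text{element-of}((i_1, j), \psi))) \land \text{wff-}L_d(I(\text{element-of}((i_2, j), \psi))) \\
&\land\, \text{wff-}L_d(I(\text{element-of}(i_1, \phi_1))) \\
&\land\, \text{wff-}L_d(I(\text{element-of}(i_2, \phi_1))) \land \text{wff-}L_d(I(\text{element-of}(j, \phi_2))) \\
&\rightarrow \text{not-ok}(I(\text{element-of}((i_1, j), \psi) \text{ AND } \text{element-of}((i_2, j), \psi))))
\end{aligned}
\tag{13}
$$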
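
The data integrity rule can be illustrated outside the logic with a short Python sketch. It assumes that ψ is stored extensionally as a collection of (i, j) pairs and, following formula (12), it treats two pairs as conflicting only when their first components are distinct. The function name fd_violations and the sample data are hypothetical.

```python
def fd_violations(psi):
    """Return pairs of tuples in psi that share the same j but have
    distinct i, i.e. the combinations that would be marked not-ok."""
    seen_by_j = {}
    violations = []
    for i, j in sorted(psi):
        for other in seen_by_j.get(j, []):
            if other != i:
                violations.append(((other, j), (i, j)))
        seen_by_j.setdefault(j, []).append(i)
    return violations

# Hypothetical contents of the object type serves.
serves = {
    ("concentrator-1", "host-1"),
    ("concentrator-2", "host-1"),   # host-1 is served twice
    ("concentrator-1", "host-2"),
}

print(fd_violations(serves))
```

An empty result means the stored data respects the declared dependency.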

4.3  Model  Formulation 

Consider the example in §2.3 illustrating the relationship between the problem
statement and its mathematical model. The statement "Each site must be
served by exactly one concentrator, though the same concentrator may serve
various sites" justifies the constraint $\sum_i X_{ij} \le 1$. It is stated in L↑ as shown
below.

$$\text{justifies}(I(\text{sites} \rightarrow \text{concentrators}),\; I(\textstyle\sum_i X_{ij} \le 1),\; v1) \tag{14}$$

The same component may be justified in different ways. These distinct
justifications are referred to as disjunctive justifications. When several components
(elements of the data or the mathematical model) jointly justify another
component, such a justification is a conjunctive justification. An example of such
a justification is that of the expression $\sum_j L_j X_{ij} \le Z_i K_i$ by the variables
$X_{ij}$, $Z_i$, $K_i$, and $L_j$. This can be stated in L↑ using list [] notation to indicate
conjunction.

$$\text{justifies}([I(X_{ij}), I(Z_i), I(K_i), I(L_j)],\; I(\textstyle\sum_j L_j X_{ij} \le Z_i K_i),\; v1) \tag{15}$$

There  are  several  benefits  that  can  accrue  from  explicitly  declaring  justifica¬ 
tions  as  a  part  of  the  model  representation. 


•  The justification network, which is the set of all justifications associated
with a formulation, can serve as active documentation of the model. It
can be queried, and thereby can promote model understandability. The
queries can not only extract information explicitly asserted but also infer
chains of justifications.

•  Model formulation can suffer from flawed reasoning. The justification
network can help detect such flaws. A cycle in a chain of justifications
associated with a component indicates such a flaw. Cycle detection algorithms
[2] can be specified in the language.
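
As an illustration of the cycle check mentioned in the second bullet, the following Python sketch runs a depth-first search over a justification network. It is not the algorithm of [2]; the representation of the network as a dictionary from each component to the components it justifies, and the name find_cycle, are assumptions.

```python
def find_cycle(justifies):
    """Return one cycle as a list of components (first == last),
    or None if the justification network is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(justifies) | {c for ts in justifies.values() for c in ts}
    color = {n: WHITE for n in nodes}
    path = []

    def visit(node):
        color[node] = GREY          # node is on the current path
        path.append(node)
        for nxt in justifies.get(node, []):
            if color[nxt] == GREY:  # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK         # fully explored, no cycle through it
        return None

    for n in sorted(nodes):
        if color[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None

# Hypothetical networks.
sound = {"X_ij": ["capacity"], "Z_i": ["capacity"], "capacity": []}
flawed = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(find_cycle(sound))
print(find_cycle(flawed))
```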

4.4  Model  Reformulation  and  Version  Management 

The justification networks, with some extensions, can help with model
reformulation and version management. This section discusses the nature of
these extensions, and how they can assist in reformulation and version
management.

Model reformulation entails changes to components. All components directly
or indirectly (i.e., on a justification path) justified by a changed component are
affected. For example, consider the assumption that link usage costs depend
only on the (i, j) pair they are an attribute of. Now suppose we replace this
with an assumption that they are also a function of link type. This implies
that in the new formulation C_ij is removed, as is the old objective function it
(jointly) justifies. It is replaced by new cost functions for leased and owned
links, and a new objective function as discussed in §2.4. If these changes were
made within the context of a single formulation, then the justifications could
simply be altered to reflect the new assumptions. However, what should be
done if the modeler chooses to retain the original model, and simply investigate
these changes in the context of a new version? Now, in addition to declaring
the justification relationships, we need a mechanism to declare whether the model
components used in the justification relationships are, or are not, believed in a
given version. In our example, the original assumption about link usage cost is
believed (declared to be IN) in the original model but not believed (declared to
be OUT) in the new version. These declarations of belief are stated as shown
below.


$$\text{in-label}(I(\text{depends-only-on}(C_{ij}, (i, j))),\; v1) \tag{16}$$

$$\text{out-label}(I(\text{depends-only-on}(C_{ij}, (i, j))),\; v2) \tag{17}$$

The first assertion states that the belief that C_ij depends only on the (i, j) pair
and not link type (i.e., the constant link usage cost assumption) is IN in version
v1, the label used to refer to the original formulation. The second assertion
states that this belief is OUT in the new version labelled v2.

With these representations (i.e., justification networks and belief status)
in place, functions can be defined in L↑ to support model
reformulation and version management. We give two specific examples: a) a
function that propagates a change in belief status of a model component (node)
in the justification network, and b) a function that computes similarities and
differences between versions. These have been adapted from [39, 40] where a
fuller discussion of functions that can be used to support reformulation and
version management can be found.

When the belief status of a model component changes, the belief status of all
the components justified directly or indirectly by it can change. Two functions,
in-propagate and out-propagate, are specified in L↑ to manage these changes in
belief status.

    function in-propagate(Component, Ver);
        if out-label(I(Component), V) and Ver ∈ V
            then in-label(I(Component), Ver)
        endif
        ∀C justifies(I(Component), I(C), Ver)
            do in-propagate(C, Ver)
    end function in-propagate

When the belief status of a component is changed to IN, the belief status of
all the model components that are directly or indirectly justified by this
component is also changed to IN. The justification network is used to identify the
components whose belief status should be changed.




    function out-propagate(Component, Ver);
        if in-label(I(Component), Ver)
            then out-label(I(Component), Ver)
        endif
        ∀C justifies(I(Component), I(C), Ver)
            if ¬∃K justifies(I(K), I(C), Ver)
                then do out-propagate(C, Ver)
            endif
    end function out-propagate

When the belief status of a component is changed to OUT, the belief status
of all components solely justified by it is also changed to OUT.
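
The two propagation functions can be sketched in Python for a single version as follows. This is an illustrative reading, not the authors' implementation: labels are kept in a dictionary, the network maps each component to the components it justifies, "solely justified" is interpreted as "no other justifier that is currently IN", and a seen set is added to guard against cycles. All names and sample components are assumptions.

```python
def in_propagate(component, labels, justifies, seen=None):
    """Mark component IN, then everything it directly or
    indirectly justifies."""
    seen = set() if seen is None else seen
    if component in seen:
        return
    seen.add(component)
    labels[component] = "IN"
    for c in justifies.get(component, []):
        in_propagate(c, labels, justifies, seen)

def out_propagate(component, labels, justifies, seen=None):
    """Mark component OUT, then every justified component that has
    no other justifier still labelled IN (assumed reading)."""
    seen = set() if seen is None else seen
    if component in seen:
        return
    seen.add(component)
    labels[component] = "OUT"
    for c in justifies.get(component, []):
        supporters = [k for k, targets in justifies.items()
                      if c in targets and k != component
                      and labels.get(k) == "IN"]
        if not supporters:
            out_propagate(c, labels, justifies, seen)

# Hypothetical justification network for one version.
justifies = {
    "constant-cost-assumption": ["C_ij"],
    "C_ij": ["old-objective"],
    "site-rule": ["linkages"],
    "x_ij": ["linkages"],
}
nodes = set(justifies) | {c for ts in justifies.values() for c in ts}
labels = {n: "IN" for n in nodes}

out_propagate("constant-cost-assumption", labels, justifies)
out_propagate("site-rule", labels, justifies)
print(labels)
```

After these two calls the cost assumption, C_ij, and the old objective are OUT, while linkages stays IN because x_ij still supports it.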

Over the course of a modeling project, several model versions may be
constructed. Some of these versions will share components. For instance, different
link usage cost assumptions resulted in two distinct versions with similar
constraint structures but different objective functions. The ability to enquire about
the similarities and differences between versions can be very useful when a
modeler works with multiple versions. The function determine-justification supports
this feature.

    function determine-justification(Component, Ver);
        if primitive-node(I(Component), Ver)
            then return Component
            else let S be the list of components such that ∀E ∈ S
                     justifies(I(E), I(Component), Ver) and in-label(I(E), Ver)
                 recursive-determine-justification(S, Ver)
    end function determine-justification

    function recursive-determine-justification(Set, Ver);
        return append(determine-justification(first(Set), Ver),
                      recursive-determine-justification(rest(Set), Ver))
    end function recursive-determine-justification

The functions determine the justification chain associated with a given model
component in a version. This function can now be used to determine similarities
and differences. Consider the variable C_ij, which represents the link usage cost,
in versions v1 and v2. This variable is a component whose belief status is IN
in both versions. However, it has a different set of justifications in v1 (i.e.,
that it depends only on the (i, j) pair it is an attribute of) and v2 (i.e., that
it also depends on link type). So here we have the same component, present
in two different versions, being justified in different ways. Using the
determine-justification function, the above mentioned justifications can be determined.
Upon comparison, the similarities and differences can be uncovered.
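
The comparison of versions can be sketched in Python as below. The sketch assumes that justifications are stored as (justifier, justified, version) triples, that a primitive node is one with no IN justifier in the given version, and that the network is acyclic. The component names, including the shared justification links-data, are hypothetical.

```python
def determine_justification(component, version, justifies, in_label):
    """Return the primitive components underlying `component`
    in `version`, following only justifiers labelled IN."""
    supporters = sorted(
        e for (e, c, v) in justifies
        if c == component and v == version and (e, version) in in_label
    )
    if not supporters:              # assumed meaning of primitive-node
        return [component]
    chain = []
    for e in supporters:
        chain.extend(determine_justification(e, version, justifies, in_label))
    return chain

# Hypothetical justification triples and IN labels.
justifies = {
    ("cost-depends-only-on-pair", "C_ij", "v1"),
    ("links-data", "C_ij", "v1"),
    ("cost-depends-on-link-type", "C_ij", "v2"),
    ("links-data", "C_ij", "v2"),
}
in_label = {
    ("cost-depends-only-on-pair", "v1"), ("links-data", "v1"),
    ("cost-depends-on-link-type", "v2"), ("links-data", "v2"),
}

j1 = set(determine_justification("C_ij", "v1", justifies, in_label))
j2 = set(determine_justification("C_ij", "v2", justifies, in_label))
common, only_v1, only_v2 = j1 & j2, j1 - j2, j2 - j1
print(common, only_v1, only_v2)
```

Set intersection and difference then give the similarities and differences between the two versions directly.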

5  Conclusions 

In this paper we began with the proposition that there is much to gain by
making data modeling language constructs available to modelers using an algebraic
modeling language. We have discussed one way of achieving this, namely via
an  embedded  languages  approach,  and  have  applied  it  to  define  several  useful 
modeling  support  functions. 

Specifically,  we  discussed  the  use  of  justifications  to  document  the  reasons 
for  a  particular  mathematical  formulation  in  terms  of  components  of  the  data 
model.  The  advantages  of  such  justifications  need  to  be  investigated  further. 
For example, we believe that they can play a useful role in model
integration. When models are integrated manually, modelers use information about
the assumptions underlying components of the models being integrated. If two
models with conflicting assumptions are being integrated, these conflicts must
be identified and resolved by the modeler. Systematic support for tasks such as
this  can  be  provided  using  justification  networks.  Another  potential  advantage 
of  our  approach  might  be  realized  in  the  initial  stages  of  model  reuse.  Model 
reuse  begins  with  the  identification  of  candidates  for  reuse  [34].  When  there  are 
several  candidates,  in  the  absence  of  support,  this  can  be  hard.  The  additional 
information  offered  by  the  justifications  could  provide  such  support. 

To  conclude,  we  have  laid  out  a  systematic  approach  for  combining  the 
strengths  of  data  and  mathematical  modeling  languages.  We  have  argued  that 
this  can  significantly  improve  the  functionality  of  model  management  systems 
and have demonstrated some of the benefits. Much more remains to be done,
but  we  hope  to  have  raised  issues  for  debate  and  future  research. 